Temporal Coherence and Prediction Decay in TD Learning

Authors

  • Donald F. Beal
  • Martin C. Smith
Abstract

This paper describes improvements to the temporal difference learning method. The standard form of the method has the problem that two control parameters, learning rate and temporal discount, need to be chosen appropriately. These parameters can have a major effect on performance, particularly the learning rate parameter, which affects the stability of the process as well as the number of observations required. Our extension to the algorithm automatically sets and subsequently adjusts these parameters. The learning rate adjustment is based on a new concept we call temporal coherence (TC). The experiments reported here compare the extended algorithm's performance against human-chosen parameters and against an earlier method for learning rate adjustment, in a complex game domain. The learning task was to learn the relative values of pieces, without any initial domain-specific knowledge, and from self-play only. The results show that the improved method leads to better learning (i.e. faster and less subject to the effects of noise) than either human-chosen values for the control parameters or the comparison method.
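The abstract describes adapting the learning rate automatically via temporal coherence. A minimal sketch of the idea, assuming a per-state TC ratio of the form |sum of TD errors| / sum of |TD errors| (so coherent, same-signed errors push the rate toward 1 and noisy, cancelling errors push it toward 0); the random-walk environment here is a stand-in for illustration, not the game domain from the paper:

```python
import random

def tc_td_learning(num_states, episodes, gamma=0.9):
    """TD(0) on a simple random walk with a temporal-coherence (TC)
    style per-state learning rate (illustrative sketch only)."""
    V = [0.0] * num_states          # value estimates
    net = [0.0] * num_states        # signed sum of TD errors per state
    tot = [1e-8] * num_states       # sum of |TD errors| (avoid div by zero)
    for _ in range(episodes):
        s = num_states // 2         # start in the middle
        while 0 < s < num_states - 1:
            s2 = s + random.choice((-1, 1))
            r = 1.0 if s2 == num_states - 1 else 0.0
            terminal = s2 in (0, num_states - 1)
            target = r if terminal else r + gamma * V[s2]
            delta = target - V[s]   # TD error
            net[s] += delta
            tot[s] += abs(delta)
            alpha = abs(net[s]) / tot[s]   # TC ratio in [0, 1]
            V[s] += alpha * delta
            s = s2
    return V
```

States near the rewarded right terminal end up with higher estimated values, and no hand-tuned learning rate is needed.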


Similar Articles

Evaluating the TD model of classical conditioning.

The temporal-difference (TD) algorithm from reinforcement learning provides a simple method for incrementally learning predictions of upcoming events. Applied to classical conditioning, TD models suppose that animals learn a real-time prediction of the unconditioned stimulus (US) on the basis of all available conditioned stimuli (CSs). In the TD model, similar to other error-correction models, ...


Can time-based decay explain temporal distinctiveness effects in task switching?

In task switching, extending the response-cue interval (RCI) reduces the switch cost--the detriment to performance when switching compared to repeating tasks. This reduction has been used as evidence for the existence of task-set decay processes. Recently, this has been challenged by the observation of sequential dependencies on the RCI effect: switch cost is only reduced at longer RCIs when th...


Shifting Attention Using a Temporal Difference Prediction Error and High-Dimensional Input

Research on reinforcement learning has increasingly focused on the role of neuromodulatory systems implicated in associative learning. Formulations of temporal difference (TD) learning have gained a great deal of attention due to the similarity of the TD prediction error and the observed activity of dopamine neurons in the primate midbrain. Recent work has attempted to integrate additional neur...


Kernel Least-Squares Temporal Difference Learning

Kernel methods have attracted many research interests recently since by utilizing Mercer kernels, non-linear and non-parametric versions of conventional supervised or unsupervised learning algorithms can be implemented and usually better generalization abilities can be obtained. However, kernel methods in reinforcement learning have not been popularly studied in the literature. In this paper, w...


TD(λ) Networks: Temporal-Difference Networks with Eligibility Traces

Temporal-difference (TD) networks have been introduced as a formalism for expressing and learning grounded world knowledge in a predictive form (Sutton & Tanner, 2005). Like conventional TD(0) methods, the learning algorithm for TD networks uses 1-step backups to train prediction units about future events. In conventional TD learning, the TD(λ) algorithm is often used to do more general multi-s...
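The snippet above contrasts 1-step backups with the multi-step credit assignment of TD(λ). A minimal sketch of one TD(λ) step with accumulating eligibility traces, for a tabular value function (parameter names and defaults are our own illustrative choices, not from the cited paper):

```python
def td_lambda_update(V, trace, s, r, s2,
                     alpha=0.1, gamma=0.9, lam=0.8, terminal=False):
    """One TD(lambda) step with accumulating eligibility traces.

    V     : list of value estimates, one per state
    trace : list of eligibility traces, same length as V
    s, s2 : current and successor state indices
    r     : reward observed on the transition
    """
    # 1-step TD error for the observed transition
    delta = r + (0.0 if terminal else gamma * V[s2]) - V[s]
    # bump the trace for the state just visited
    trace[s] += 1.0
    # credit every recently visited state in proportion to its trace,
    # then decay all traces by gamma * lambda
    for i in range(len(V)):
        V[i] += alpha * delta * trace[i]
        trace[i] *= gamma * lam
    return delta
```

With λ = 0 this collapses to the 1-step backup used by conventional TD networks; larger λ spreads each error over earlier states.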



Publication date: 1999